An LLM safety monitor's evaluator can be tricked into clearing dangerous sessions when the attacker plants fake analysis text in the monitored conversation. Experimental results, defense limits, and structural separation points.
Composio publishes security analysis of OpenClaw. Approximately 7.1% of SkillHub-distributed skills were found to have critical vulnerabilities, leaving over 30,000 instances exposed to the internet in the early stages at risk of prompt injection and credential theft.
AI Security for Apps reached GA, letting Cloudflare block prompt injection and PII leaks at the WAF layer. On the same day, it also launched RFC 9457-compatible error responses that replace HTML with JSON or Markdown when AI agents hit Cloudflare errors.
A prompt-injection attack in a GitHub issue title tricked an AI triage bot into stealing npm tokens, which were then used to publish a malicious package in a five-step supply-chain attack chain.
Techniques and defenses from the MINJA, InjecMEM, and ToxicSkills campaigns that poison AI agents’ memory files, and the fact that GPT-5.3-Codex achieved a 72% exploit success rate on EVMbench released by OpenAI and Paradigm. This article organizes how AI becomes both a target of attacks and a weapon for attackers.